Biclustering with Background Knowledge using Formal Concept Analysis

Authors

  • Faris Alqadah
  • Joel S. Bader
  • Rajul Anand
  • Chandan K. Reddy
Abstract

Biclustering methods have proven to be critical tools in the exploratory analysis of high-dimensional data, including information networks, microarray experiments, and bag-of-words data. However, most biclustering methods fail to answer specific questions of interest and do not incorporate background knowledge and expertise from the user. To this end, query-based biclustering algorithms have recently been developed in the context of microarray data; these algorithms use a set of seed genes provided by the user to prune the search space and guide the biclustering algorithm. In this paper, a novel Query-Based Bi-Clustering algorithm, QBBC, is proposed via a new formulation that combines the advantages of low-variance biclustering techniques and Formal Concept Analysis. We prove that statistical dispersion measures that are order-preserving induce an ordering on the set of biclusters in the data. In turn, this ordering is exploited to form query-based biclusters efficiently. Our approach provides a mechanism to generalize query-based biclustering to sparse high-dimensional data such as information networks and bag-of-words data. Moreover, the proposed framework takes a local approach to query-based biclustering, as opposed to the global approaches that previous algorithms have employed. Experimental results indicate that this local approach often produces higher-quality and more precise biclusters than state-of-the-art query-based methods. In addition, a performance evaluation illustrates the efficiency and scalability of QBBC compared to full biclustering approaches and other existing query-based approaches.
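As a rough illustration of the Formal Concept Analysis side of this idea, the sketch below closes a user-supplied seed set into a formal concept on a toy binary context. It is not the QBBC algorithm, whose details are not given in the abstract; it shows only the standard FCA derivation operators, and the data and all names (`context`, `common_attributes`, `common_objects`, `concept_from_seed`) are hypothetical.

```python
from typing import Dict, Set, Tuple

# Toy binary context (hypothetical data): each object (e.g. a gene)
# maps to the set of attributes (e.g. conditions) it is related to.
Context = Dict[str, Set[str]]


def common_attributes(context: Context, objects: Set[str]) -> Set[str]:
    """Derivation operator: attributes shared by every object in `objects`."""
    shared = set().union(*context.values())  # start from all attributes
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(context: Context, attributes: Set[str]) -> Set[str]:
    """Derivation operator: objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def concept_from_seed(context: Context, seed: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Close a seed set of objects into a formal concept (extent, intent)."""
    intent = common_attributes(context, seed)
    extent = common_objects(context, intent)
    return extent, intent


context: Context = {
    "g1": {"c1", "c2", "c3"},
    "g2": {"c1", "c2"},
    "g3": {"c2", "c3"},
    "g4": {"c1", "c2", "c4"},
}

# Seed query {g1, g2}: the closure also pulls in g4, which shares c1 and c2.
extent, intent = concept_from_seed(context, {"g1", "g2"})
print(sorted(extent), sorted(intent))
```

The closure touches only the rows and columns related to the seed, which is one way to read the "local" flavor of a query-based search; how QBBC combines this with dispersion measures is left to the full paper.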

Related resources

Query-based Biclustering using Formal Concept Analysis

Biclustering methods have proven to be critical tools in the exploratory analysis of high-dimensional data including information networks, microarray experiments, and bag of words data. However, most biclustering methods fail to answer specific questions of interest and do not incorporate prior knowledge and expertise from the user. To this end, query-based biclustering algorithms that are rece...


Mining Biclusters of Similar Values with Triadic Concept Analysis

Biclustering numerical data became a popular data-mining task in the beginning of 2000’s, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data-table. So called biclusters of similar values can be thought as maximal sub-tables with close values. Only few methods address...


Extraction de biclusters à valeurs similaires avec l’analyse de concepts triadiques

Biclustering numerical data became a popular datamining task in the beginning of 2000’s, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data-table. So called biclusters of similar values can be thought as maximal sub-tables with close values. Only few methods address ...


Enumerating all maximal biclusters in numerical datasets

Biclustering has proved to be a powerful data analysis technique due to its wide success in various application domains. However, the existing literature presents efficient solutions only for enumerating maximal biclusters with constant values, or heuristic-based approaches which can not find all biclusters or even support the maximality of the obtained biclusters. Here, we present a general fa...


From Triconcepts to Triclusters

A novel approach to triclustering of a three-way binary data is proposed. Tricluster is defined in terms of Triadic Formal Concept Analysis as a dense triset of a binary relation Y , describing relationship between objects, attributes and conditions. This definition is a relaxation of a triconcept notion and makes it possible to find all triclusters and triconcepts contained in triclusters of l...


Publication date: 2012